Overview

Dataset Statistics

Number of Variables 15
Number of Rows 13689
Missing Cells 30
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 13.5 MB
Average Row Size in Memory 1.0 KB
Variable Types
  • Numerical: 1
  • Categorical: 14
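
Missing Cells (%) appears to be computed over every cell in the table (rows × variables). All 30 missing cells are in Person City/State, where they are 0.2% of that column, yet the table-wide figure shows 0.0%. A quick check of that arithmetic, using only the counts reported in this document:

```python
# Counts copied from Dataset Statistics and the Person City/State section.
n_rows = 13689
n_vars = 15
missing_cells = 30

total_cells = n_rows * n_vars                    # 205,335 cells in the whole table
table_pct = 100 * missing_cells / total_cells    # share of all cells
column_pct = 100 * missing_cells / n_rows        # share of one column's cells

print(f"table: {table_pct:.4f}% -> {table_pct:.1f}%")
print(f"Person City/State: {column_pct:.2f}% -> {column_pct:.1f}%")
```

Rounded to one decimal place, the table-wide share becomes 0.0% while the column share stays at 0.2%.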

Dataset Insights

raw_href has a high cardinality: 7288 distinct values [High Cardinality]
acc_rpt_num has a high cardinality: 7288 distinct values [High Cardinality]
acc_uuid has a high cardinality: 13689 distinct values [High Cardinality]
Name has a high cardinality: 11955 distinct values [High Cardinality]
Age has a high cardinality: 100 distinct values [High Cardinality]
Person City/State has a high cardinality: 2753 distinct values [High Cardinality]
Date has a high cardinality: 365 distinct values [High Cardinality]
Time has a high cardinality: 1190 distinct values [High Cardinality]
Crash County has a high cardinality: 115 distinct values [High Cardinality]
Crash Location has a high cardinality: 7190 distinct values [High Cardinality]
Report has constant value "View" [Constant]
raw_href has constant length 82 [Constant Length]
acc_uuid has constant length 36 [Constant Length]
Report has constant length 4 [Constant Length]
Date has constant length 10 [Constant Length]
Troop has constant length 1 [Constant Length]
acc_uuid has all distinct values [Unique]

Variables


index

numerical

Approximate Distinct Count 82
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 219024
Mean 20.3409
Minimum 0
Maximum 81
Zeros 365
Zeros (%) 2.7%
Negatives 0
Negatives (%) 0.0%
  • index is skewed right (γ1 = 0.7113)

Quantile Statistics

Minimum 0
5th Percentile 1
Q1 9
Median 18
Q3 30
95th Percentile 46
Maximum 81
Range 81
IQR 21

Descriptive Statistics

Mean 20.3409
Standard Deviation 14.1465
Variance 200.1242
Sum 278446
Skewness 0.7113
Kurtosis 0.2037
Coefficient of Variation 0.6955
  • index is not normally distributed (p-value 0.0022)
  • index has 110 outliers
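
The summary figures for index are internally consistent: the mean is Sum divided by the row count, the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, and IQR and Range are Q3 - Q1 and Maximum - Minimum. A short check using only the reported numbers (the recomputed variance differs slightly in the third decimal because the reported standard deviation is itself rounded):

```python
# Summary values for `index`, copied from the tables above.
n = 13689
total = 278446
sd = 14.1465
q1, q3 = 9, 30
minimum, maximum = 0, 81

mean = total / n              # Sum / row count
cv = sd / mean                # coefficient of variation = SD / mean
variance = sd ** 2            # variance is the squared standard deviation
iqr = q3 - q1                 # interquartile range
value_range = maximum - minimum

print(round(mean, 4), round(cv, 4), round(variance, 2), iqr, value_range)
```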

raw_href

categorical

Approximate Distinct Count 7288
Approximate Unique (%) 53.2%
Missing 0
Missing (%) 0.0%
Memory Size 2012283

Length

Mean 82
Standard Deviation 0
Median 82
Minimum 82
Maximum 82

Sample

1st row https://www.mshp.d...
2nd row https://www.mshp.d...
3rd row https://www.mshp.d...
4th row https://www.mshp.d...
5th row https://www.mshp.d...

Letter

Count 793962
Lowercase Letter 602316
Space Separator 0
Uppercase Letter 191646
Dash Punctuation 6
Decimal Number 150570
  • raw_href contains many words: 7288 words
  • raw_href has words of constant length
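
The Letter table tallies characters by Unicode general category (Ll lowercase, Lu uppercase, Zs space, Pd dash, Nd decimal digit). The figures are consistent with Count being lowercase plus uppercase letters (602316 + 191646 = 793962 for raw_href). Characters in other categories, such as "/", ":" or ".", are not listed. A minimal standard-library sketch of such a tally; the category-to-label mapping is an assumption, not the profiling tool's own code:

```python
import unicodedata
from collections import Counter

# Report labels keyed by Unicode general category code (assumed mapping).
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def char_categories(values):
    """Count characters per Unicode general category across all non-missing values."""
    counts = Counter()
    for value in values:
        if value is None:                      # skip missing cells
            continue
        for ch in value:
            label = CATEGORY_LABELS.get(unicodedata.category(ch))
            if label:                          # other categories are ignored
                counts[label] += 1
    # "Count" is taken to be all letters: lowercase plus uppercase.
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts

print(char_categories(["ANDERSON, ROMAN V"]))
```

Run over 13689 copies of "View", this reproduces the Report column's Letter table (3 lowercase and 1 uppercase letter per row).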

acc_rpt_num

categorical

Approximate Distinct Count 7288
Approximate Unique (%) 53.2%
Missing 0
Missing (%) 0.0%
Memory Size 1013074

Length

Mean 9.0064
Standard Deviation 0.1132
Median 9
Minimum 9
Maximum 11

Sample

1st row 220114694
2nd row 220114685
3rd row 220114685
4th row 220114625
5th row 220114774

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 6
Decimal Number 123236
  • acc_rpt_num contains many words: 7288 words
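
acc_rpt_num has only 7288 distinct values across 13689 rows, and rows 2 and 3 of the sample share 220114685, while acc_uuid is unique per row. Each row therefore appears to describe one person involved in a crash rather than one crash. A sketch counting rows per report number on the sample values:

```python
from collections import Counter

# First five acc_rpt_num values from the sample above.
sample = ["220114694", "220114685", "220114685", "220114625", "220114774"]

rows_per_report = Counter(sample)
print(rows_per_report.most_common(1))   # [('220114685', 2)]

# Full table: 13689 rows over 7288 distinct report numbers,
# i.e. about 1.88 person rows per crash report on average.
print(round(13689 / 7288, 2))
```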

acc_uuid

categorical

Approximate Distinct Count 13689
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 1382589

Length

Mean 36
Standard Deviation 0
Median 36
Minimum 36
Maximum 36

Sample

1st row 00fec566-ab50-4afd...
2nd row 99d9e043-82bc-46a8...
3rd row 0c9d3e3c-2697-49cd...
4th row 0c8ff83c-4093-4bb5...
5th row bbe76c58-31cd-4e9f...

Letter

Count 160023
Lowercase Letter 160023
Space Separator 0
Uppercase Letter 0
Dash Punctuation 54756
Decimal Number 278025
  • acc_uuid contains many words: 13689 words
  • acc_uuid has words of constant length
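
acc_uuid looks like canonical hyphenated UUIDs: 36 characters, 4 hyphens and 32 lowercase hex digits each. That matches the Letter table: 4 × 13689 = 54756 hyphens, and 32 × 13689 = 438048 = 160023 letters + 278025 digits. A sketch of a format check using Python's uuid module; is_canonical_uuid is a hypothetical helper name:

```python
import uuid

def is_canonical_uuid(value):
    """Hypothetical helper: True if value is a lowercase, hyphenated 36-character UUID."""
    try:
        # uuid.UUID also accepts braces, "urn:uuid:" and uppercase; comparing with
        # str() (always lowercase and hyphenated) keeps only the canonical form.
        return str(uuid.UUID(value)) == value
    except (TypeError, ValueError, AttributeError):
        return False

example = str(uuid.uuid4())          # random UUID, not a value from this dataset
print(len(example), example.count("-"), is_canonical_uuid(example))   # 36 4 True

n_rows = 13689
print(4 * n_rows)                    # hyphens expected across the column: 54756
print(32 * n_rows, 160023 + 278025)  # hex characters expected vs. letters + digits
```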

Report

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 944541

Length

Mean 4
Standard Deviation 0
Median 4
Minimum 4
Maximum 4

Sample

1st row View
2nd row View
3rd row View
4th row View
5th row View

Letter

Count 54756
Lowercase Letter 41067
Space Separator 0
Uppercase Letter 13689
Dash Punctuation 0
Decimal Number 0
  • Report has words of constant length
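
Report holds the same value, "View", in every row, so it carries no information for analysis. It looks like the link label from the source page, though that is an inference. A sketch of detecting constant columns before dropping them, on hypothetical rows built from sample values shown in this report:

```python
def constant_columns(rows):
    """Return column names whose non-missing values are all identical."""
    if not rows:
        return []
    constant = []
    for col in rows[0]:
        values = {row.get(col) for row in rows if row.get(col) is not None}
        if len(values) <= 1:        # a single distinct value (or none at all)
            constant.append(col)
    return constant

# Hypothetical rows shaped like this dataset, using sample values from the report.
rows = [
    {"Report": "View", "Troop": "E", "Personal Injury": "MINOR"},
    {"Report": "View", "Troop": "A", "Personal Injury": "MINOR"},
    {"Report": "View", "Troop": "A", "Personal Injury": "NO INJURY"},
]
print(constant_columns(rows))   # ['Report']
```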

Name

categorical

Approximate Distinct Count 11955
Approximate Unique (%) 87.3%
Missing 0
Missing (%) 0.0%
Memory Size 1101837
  • The most frequent value (JUVENILE,) occurs over 19.82 times as often as the second most frequent value (UNKNOWN, UNKNOWN)

Length

Mean 15.4907
Standard Deviation 3.2861
Median 16
Minimum 7
Maximum 31

Sample

1st row ANDERSON, ROMAN V
2nd row HAWKINS, DARIAN T
3rd row WEBSTER, JAMES R
4th row TALBOT, DJ
5th row ALEXANDER, DEVON L

Letter

Count 174693
Lowercase Letter 0
Space Separator 23475
Uppercase Letter 174693
Dash Punctuation 140
Decimal Number 9
  • Name contains many words: 8992 words
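
The Name insight compares how often values occur, not their size. "JUVENILE," occurs about 19.82 times as often as "UNKNOWN, UNKNOWN", which suggests both are placeholder entries rather than real names. A sketch of that ratio; the counts below are toy data for illustration, not the dataset's counts:

```python
from collections import Counter

def top_two_ratio(values):
    """How many times more often the most frequent value occurs than the runner-up.

    Assumes at least two distinct values are present.
    """
    (first, n1), (second, n2) = Counter(values).most_common(2)
    return first, second, n1 / n2

# Toy data: placeholder-style names dominate a small list.
names = ["JUVENILE,"] * 6 + ["UNKNOWN, UNKNOWN"] * 2 + ["WEBSTER, JAMES R"]
print(top_two_ratio(names))   # ('JUVENILE,', 'UNKNOWN, UNKNOWN', 3.0)
```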

Age

categorical

Approximate Distinct Count 100
Approximate Unique (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory Size 916902

Length

Mean 1.9809
Standard Deviation 0.1968
Median 2
Minimum 1
Maximum 3

Sample

1st row 38
2nd row 48
3rd row 38
4th row 59
5th row 19

Letter

Count 411
Lowercase Letter 0
Space Separator 0
Uppercase Letter 411
Dash Punctuation 0
Decimal Number 26706
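
Age is profiled as categorical rather than numerical. Its Letter table shows 411 uppercase letters alongside the digits, so some entries are non-numeric codes; which codes they are is not shown here. A sketch that converts Age to integers and maps anything non-numeric to None; "UNK" below is a hypothetical code:

```python
def parse_age(value):
    """Convert an Age string to int; return None for missing or non-numeric entries."""
    value = (value or "").strip()
    # isdecimal() rejects letters and also digit-like symbols such as "²" that int() cannot parse.
    return int(value) if value.isdecimal() else None

ages = ["38", "48", "38", "59", "19", "UNK"]   # sample rows plus a hypothetical code
parsed = [parse_age(a) for a in ages]
print(parsed)   # [38, 48, 38, 59, 19, None]
```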

Person City/State

categorical

Approximate Distinct Count 2753
Approximate Unique (%) 20.2%
Missing 30
Missing (%) 0.2%
Memory Size 1062306

Length

Mean 12.7733
Standard Deviation 3.1224
Median 12
Minimum 2
Maximum 73

Sample

1st row HOWARDVILLE, MO
2nd row PECULIAR, MO
3rd row LATOUR, MO
4th row NORBORNE, MO
5th row CONWAY, MO

Letter

Count 143391
Lowercase Letter 0
Space Separator 17446
Uppercase Letter 143391
Dash Punctuation 2
Decimal Number 27
  • Person City/State contains many words: 2035 words
  • The most frequent word (mo) occurs over 10.49 times as often as the second most frequent word (st)
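
Person City/State values follow a "CITY, ST" pattern. The mo/st insight works at the word level on lowercased tokens, which is why the state code MO dominates. A sketch that splits off the state code and counts lowercase words; the tokenization rule (runs of letters and digits) is an assumption:

```python
import re
from collections import Counter

def split_city_state(value):
    """Split 'CITY, ST' into (city, state); missing -> (None, None), no comma -> (value, None)."""
    if value is None:
        return None, None
    city, sep, state = value.rpartition(", ")
    return (city, state) if sep else (value, None)

def word_counts(values):
    """Lowercased word frequencies (assumed tokenization: letter/digit runs)."""
    counts = Counter()
    for value in values:
        if value:
            counts.update(re.findall(r"[a-z0-9]+", value.lower()))
    return counts

sample = ["HOWARDVILLE, MO", "PECULIAR, MO", "LATOUR, MO", "NORBORNE, MO", "CONWAY, MO", None]
print([split_city_state(v) for v in sample])
print(word_counts(sample).most_common(1))   # [('mo', 5)]
```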

Personal Injury

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 984975

Length

Mean 6.9538
Standard Deviation 1.6774
Median 7
Minimum 5
Maximum 9

Sample

1st row MINOR
2nd row MINOR
3rd row NO INJURY
4th row MODERATE
5th row SERIOUS

Letter

Count 91569
Lowercase Letter 0
Space Separator 3621
Uppercase Letter 91569
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (MINOR, NO INJURY) account for over 50.0% of rows
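
The top-2 insight means the two most frequent categories together cover more than half of the rows. The same check applies to YES and NO in Safety Device. A sketch of that share, run here on only the five sample rows above, so the result is not the dataset's real share:

```python
from collections import Counter

def top_k_share(values, k=2):
    """Return the k most frequent categories and the fraction of rows they cover."""
    top = Counter(values).most_common(k)   # ties keep first-seen order
    return [name for name, _ in top], sum(n for _, n in top) / len(values)

sample = ["MINOR", "MINOR", "NO INJURY", "MODERATE", "SERIOUS"]   # first five rows above
print(top_k_share(sample))   # (['MINOR', 'NO INJURY'], 0.6)
```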

Safety Device

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 930419
  • The most frequent value (YES) occurs over 2.78 times as often as the second most frequent value (NO)

Length

Mean 2.9684
Standard Deviation 1.0309
Median 3
Minimum 2
Maximum 7

Sample

1st row NO
2nd row YES
3rd row YES
4th row YES
5th row NO

Letter

Count 40634
Lowercase Letter 0
Space Separator 0
Uppercase Letter 40634
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (YES, NO) account for over 50.0% of rows
  • The most frequent word (yes) occurs over 2.78 times as often as the second most frequent word (no)

Date

categorical

Approximate Distinct Count 365
Approximate Unique (%) 2.7%
Missing 0
Missing (%) 0.0%
Memory Size 1026675

Length

Mean 10
Standard Deviation 0
Median 10
Minimum 10
Maximum 10

Sample

1st row 03/06/2022
2nd row 03/06/2022
3rd row 03/06/2022
4th row 03/06/2022
5th row 03/06/2022

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 109512
  • Date has words of constant length
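
Date is stored as 10-character strings such as "03/06/2022", with 365 distinct values. "03/06/2022" is ambiguous between 6 March and 3 June. The sketch below assumes month-first order, as is usual for a US source, and shows how the day-first reading would differ:

```python
from datetime import datetime

def parse_report_date(value):
    """Parse a Date value; month-first ('%m/%d/%Y') order is an assumption."""
    return datetime.strptime(value, "%m/%d/%Y").date()

d = parse_report_date("03/06/2022")
print(d.isoformat())                                             # 2022-03-06

# Day-first reading of the same string, for comparison.
print(datetime.strptime("03/06/2022", "%d/%m/%Y").date().isoformat())   # 2022-06-03
```

strptime raises ValueError on any value that does not match the format, which makes it a simple validity check for the whole column.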

Time

categorical

Approximate Distinct Count 1190
Approximate Unique (%) 8.7%
Missing 0
Missing (%) 0.0%
Memory Size 975013

Length

Mean 6.226
Standard Deviation 0.4183
Median 6
Minimum 6
Maximum 7

Sample

1st row 9:24PM
2nd row 9:15PM
3rd row 9:15PM
4th row 8:18PM
5th row 8:15PM

Letter

Count 27378
Lowercase Letter 0
Space Separator 0
Uppercase Letter 27378
Dash Punctuation 0
Decimal Number 44161
  • Time contains many words: 1190 words
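
Time values are 6 or 7 characters long with exactly two uppercase letters per row (27378 = 2 × 13689). That fits a 12-hour clock without a leading zero plus an AM/PM suffix, as in "9:24PM". A sketch combining Date and Time into one timestamp, again assuming month-first dates and an English AM/PM locale:

```python
from datetime import datetime

def parse_crash_datetime(date_str, time_str):
    """Combine Date ('MM/DD/YYYY', month-first assumed) and Time ('9:24PM') values."""
    # %I accepts a single-digit hour; %p matches AM/PM case-insensitively.
    return datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %I:%M%p")

ts = parse_crash_datetime("03/06/2022", "9:24PM")
print(ts.isoformat())   # 2022-03-06T21:24:00
```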

Crash County

categorical

Approximate Distinct Count 115
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory Size 988696

Length

Mean 7.2256
Standard Deviation 2.1953
Median 7
Minimum 3
Maximum 14

Sample

1st row NEW MADRID
2nd row CASS
3rd row CASS
4th row RAY
5th row LACLEDE

Letter

Count 94416
Lowercase Letter 0
Space Separator 2417
Uppercase Letter 94416
Dash Punctuation 0
Decimal Number 0
  • The most frequent word (st) occurs over 1.99 times as often as the second most frequent word (louis)

Crash Location

categorical

Approximate Distinct Count 7190
Approximate Unique (%) 52.5%
Missing 0
Missing (%) 0.0%
Memory Size 1357895

Length

Mean 34.1961
Standard Deviation 10.431
Median 33
Minimum 6
Maximum 82

Sample

1st row SECOND STREET IN L...
2nd row I-49 AT PECULIAR W...
3rd row I-49 AT PECULIAR W...
4th row MO 10, 1087 FT EAS...
5th row HWY CC 2 MILES NOR...

Letter

Count 340548
Lowercase Letter 0
Space Separator 81133
Uppercase Letter 340548
Dash Punctuation 4172
Decimal Number 38715
  • Crash Location contains many words: 3910 words
  • The most frequent word (of) occurs over 2.33 times as often as the second most frequent word (at)

Troop

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 903474

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row E
2nd row A
3rd row A
4th row A
5th row I

Letter

Count 13689
Lowercase Letter 0
Space Separator 0
Uppercase Letter 13689
Dash Punctuation 0
Decimal Number 0
  • Troop has words of constant length
